A Neural Networks Committee for the Contextual Bandit Problem
Authors
Abstract
This paper presents NeuralBandit, a new contextual bandit algorithm that requires no stationarity assumption on contexts or rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset, with and without stationarity of rewards.
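The idea of training neural networks to model reward given context can be illustrated with a minimal sketch. It is not the paper's NeuralBandit implementation: the one-network-per-arm layout, the epsilon-greedy exploration rule, the class names `ArmNetwork` and `EpsilonGreedyNeuralBandit`, and all hyperparameters are assumptions made for illustration, and the multi-expert selection of perceptron parameters is not shown.

```python
import math
import random


class ArmNetwork:
    """One-hidden-layer perceptron that predicts the reward of a single arm."""

    def __init__(self, n_features, n_hidden, lr, rng):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_features)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _hidden(self, x):
        # tanh activations of the hidden layer for context x.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def update(self, x, reward):
        # One stochastic gradient step on 0.5 * (prediction - reward) ** 2.
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - reward
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation, using w2 before its update.
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            self.b1[j] -= self.lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_pre * xi
        self.b2 -= self.lr * err


class EpsilonGreedyNeuralBandit:
    """Assumed design: one network per arm, epsilon-greedy exploration."""

    def __init__(self, n_arms, n_features, n_hidden=8, lr=0.05,
                 epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.arms = [ArmNetwork(n_features, n_hidden, lr, self.rng)
                     for _ in range(n_arms)]

    def choose(self, x):
        # Explore uniformly with probability epsilon, otherwise pick the
        # arm whose network predicts the highest reward for this context.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        preds = [arm.predict(x) for arm in self.arms]
        return max(range(len(preds)), key=preds.__getitem__)

    def learn(self, x, arm, reward):
        # Only the network of the played arm sees the observed reward.
        self.arms[arm].update(x, reward)
```

In an online loop, each round would call `choose(x)` on the incoming context and then `learn(x, arm, reward)` with the observed reward. Because every update is a single gradient step on the latest observation, the networks keep adapting as rewards change over time, which is one simple way to cope without a stationarity assumption.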
Similar resources
Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks
In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate, as an example, the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently...
arXiv:1201.6181v2 [cs.NI] 12 Jul 2012. Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks. Technical Report
In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate, as an example, the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently...
A committee machine approach for predicting permeability from well log data: a case study from a heterogeneous carbonate reservoir, Balal oil Field, Persian Gulf
The permeability prediction problem has been examined using several methods, such as empirical formulas, regression analysis, and intelligent systems, especially neural networks and fuzzy logic. This study proposes an improved and novel model for predicting permeability from conventional well log data. The methodology is an integration of empirical formulas, multiple regression and neuro-fuzzy in a commi...
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models
Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model wi...
Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the val...